Restructuring Multilingual Web Sites

Authors

  • Paolo Tonella
  • Filippo Ricca
  • Emanuele Pianta
  • Christian Girardi

Abstract

Current practice in Web site development does not explicitly address the problems specific to multilingual sites. The site is expected to provide the same information, as well as the same navigation paths, page formatting and organization, independently of the chosen language. This is typically ensured by adopting personal conventions for naming pages and placing them in the file system. Updates are then performed manually, and consistency depends on the programmers' ability not to miss any impact of a change. In this paper an extension to XHTML, called MLHTML (MultiLingual XHTML), is proposed as the target representation of a restructuring process aimed at producing a maintainable and consistent multilingual Web site. MLHTML centralizes the language-dependent variants of a page in a single representation, where shared parts are not duplicated. Existing sites can be migrated to MLHTML by means of the algorithms described in this paper. After the pages are classified according to their language, a page alignment technique is exploited to identify corresponding pages and to eliminate inconsistencies. Transformation into MLHTML can then be achieved automatically.
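The abstract outlines a three-step migration: classify pages by language, align corresponding pages, then merge each aligned group into a single page in which shared parts appear once. The following is a minimal sketch of that pipeline under loudly stated assumptions, not the authors' algorithms or the real MLHTML syntax. Pages are assumed to be pre-split into flat lists of segments (tags or text runs). Language classification is a crude stopword count with hypothetical word lists. Alignment relies on a hypothetical `name.xx.html` naming convention. Merging uses Python's `difflib.SequenceMatcher` as a stand-in diff technique, and the `<ml:text lang="...">` element is invented for illustration. The sketch does not attempt the inconsistency detection the paper describes.

```python
import difflib
import re
from collections import defaultdict

# Hypothetical stopword lists for a crude language guess (illustrative only).
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is"},
    "it": {"il", "la", "le", "di", "che", "e"},
}


def classify_language(text: str) -> str:
    """Return the language whose stopwords occur most often in `text`."""
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    return max(scores, key=scores.get)


def align_pages(pages: dict[str, list[str]]) -> dict[str, dict[str, list[str]]]:
    """Group pages whose names differ only by a two-letter language infix.

    Assumes a hypothetical convention such as 'news.en.html' / 'news.it.html';
    the language key itself comes from classify_language, not from the name.
    """
    groups: dict[str, dict[str, list[str]]] = defaultdict(dict)
    for path, segments in pages.items():
        base = re.sub(r"\.[a-z]{2}(?=\.html$)", "", path)
        groups[base][classify_language(" ".join(segments))] = segments
    return dict(groups)


def merge_variants(variants: dict[str, list[str]]) -> list[str]:
    """Merge exactly two aligned variants into one segment list.

    Segments equal in both variants are emitted once; differing runs become
    per-language alternatives wrapped in a hypothetical <ml:text> element.
    """
    (lang_a, a), (lang_b, b) = sorted(variants.items())
    merged: list[str] = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            merged.extend(a[i1:i2])  # shared part, kept once
            continue
        if i1 < i2:
            merged.append(f'<ml:text lang="{lang_a}">{" ".join(a[i1:i2])}</ml:text>')
        if j1 < j2:
            merged.append(f'<ml:text lang="{lang_b}">{" ".join(b[j1:j2])}</ml:text>')
    return merged


if __name__ == "__main__":
    pages = {
        "news.en.html": ["<h1>", "The news of the day", "</h1>", "<img src='logo.png'>"],
        "news.it.html": ["<h1>", "Le notizie di oggi", "</h1>", "<img src='logo.png'>"],
    }
    for base, variants in align_pages(pages).items():
        print(base, merge_variants(variants))
```

`SequenceMatcher` accepts any sequence of hashable items, so tags and text runs can be compared uniformly. The shared heading tags and image appear once in the output, while only the translated text is duplicated per language. A realistic migration would compare document trees rather than flat lists and would handle more than two languages.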


Similar resources

EuroGOV: Engineering a Multilingual Web Corpus

EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawled from the European Union portal, European Union member state governmental web sites, and Russian government web sites. The corpus contains over 3 million documents written in more than 20 different European languages...


Towards a Multilingual Ontology for Ontology-driven Content Mining in Social Web Sites

Social Semantic Web aims at combining approaches and technologies from both the Social and the Semantic Web. While Social Web sites provide a rich source of unstructured information, which makes its automatic processing very limited, the Semantic Web aims at giving a well-defined meaning to Web information, facilitating its sharing and processing. Multilinguality is an emergent aspect to be considered in...


A Model of Versioned Web Sites

In this paper we present a model of versioned web sites which is aimed at building a web site configuration. The web site configuration is a consistent version of the web site and serves for navigation purposes. We exploit the fact that the versioning of web sites is in many aspects similar to versioning of software systems (and their components). On the other hand, specific characteristics rel...


Adaptive, Multilingual Named Entity Recognition in Web Pages

Most of the information on the Web today is in the form of HTML documents, which are designed for presentation purposes and not for machine understanding and reasoning. Existing web extraction systems require a lot of human involvement for maintenance due to changes to targeted web sites and for adaptation to new web sites or even to new domains. This paper presents the adaptive, multilingual n...


Building a Social Media Digital Library: Collection, Management, and Analytics

In this talk I will present the University of Arizona Artificial Intelligence Lab’s recent research in Dark Web, Geopolitical Web, and Business Analytics. Based on funding from the NSF and several other US agencies, the AI Lab has developed techniques for collecting, managing and analyzing large-scale multilingual and multimedia social media contents of relevance to social, geopolitical, and bus...




Publication date: 2002